26 research outputs found

    Connectivity of Random Annulus Graphs and the Geometric Block Model

    Get PDF
    We provide new connectivity results for {\em vertex-random graphs} or {\em random annulus graphs} which are significant generalizations of random geometric graphs. Random geometric graphs (RGG) are one of the most basic models of random graphs for spatial networks proposed by Gilbert in 1961, shortly after the introduction of the Erd\H{o}s-R\'{en}yi random graphs. They resemble social networks in many ways (e.g. by spontaneously creating cluster of nodes with high modularity). The connectivity properties of RGG have been studied since its introduction, and analyzing them has been significantly harder than their Erd\H{o}s-R\'{en}yi counterparts due to correlated edge formation. Our next contribution is in using the connectivity of random annulus graphs to provide necessary and sufficient conditions for efficient recovery of communities for {\em the geometric block model} (GBM). The GBM is a probabilistic model for community detection defined over an RGG in a similar spirit as the popular {\em stochastic block model}, which is defined over an Erd\H{o}s-R\'{en}yi random graph. The geometric block model inherits the transitivity properties of RGGs and thus models communities better than a stochastic block model. However, analyzing them requires fresh perspectives as all prior tools fail due to correlation in edge formation. We provide a simple and efficient algorithm that can recover communities in GBM exactly with high probability in the regime of connectivity

    METAM: Goal-Oriented Data Discovery

    Full text link
    Data is a central component of machine learning and causal inference tasks. The availability of large amounts of data from sources such as open data repositories, data lakes and data marketplaces creates an opportunity to augment data and boost those tasks' performance. However, augmentation techniques rely on a user manually discovering and shortlisting useful candidate augmentations. Existing solutions do not leverage the synergy between discovery and augmentation, thus under exploiting data. In this paper, we introduce METAM, a novel goal-oriented framework that queries the downstream task with a candidate dataset, forming a feedback loop that automatically steers the discovery and augmentation process. To select candidates efficiently, METAM leverages properties of the: i) data, ii) utility function, and iii) solution set size. We show METAM's theoretical guarantees and demonstrate those empirically on a broad set of tasks. All in all, we demonstrate the promise of goal-oriented data discovery to modern data science applications.Comment: ICDE 2023 pape
    corecore